ABSTRACT


The Texas Instruments Advanced Scientific Computer
is a very powerful, very fast, large central memory
computer for scientific applications programs. Large
programs such as NASTRAN, which perform extensive
numerical processing of large matrices, should require
significantly less central processor time when run on
such a computer. A feasibility study was carried out
which addressed the possibility of running NASTRAN in
a split mode on the Texas Instruments Advanced Scientific
Computer at the Naval Research Laboratory and on the CDC
6000 at DTNSRDC, and included experiments on the TIASC
computer.

The major problem area found by the feasibility study
involves transfer of data between the two computers,
since the availability of a reliable, cheap, and fast
means of transfer of very large amounts of data is not
a certainty. If this problem can be overcome, the
advantages of such a split mode operation, with respect
to cost and resources, could be significant. The transfer
of data will be accomplished by the Navy Laboratory
Computer Network (NALCON), which will tie together the
computers of ten Navy laboratories. A file transfer
protocol which would allow binary data transfer between
these two computers, thereby reducing the amount of data
processing required, is recommended as part of NALCON.

In addition, the standard NASTRAN matrix decomposition
code is not cost effective on the Texas Instruments
Advanced Scientific Computer. The matrix decomposition
code and the input/output code associated with it should
be replaced to make a split mode operation cost effective.

INTRODUCTION


The Naval Research Laboratory (NRL) is presently performing
acceptance tests on its very powerful, very fast Texas Instruments
Advanced Scientific Computer (TIASC). This computer derives its
increased speed principally from a pipeline processing concept and is
expected to have a capability similar to the IBM 360/91 but with a speed
enhancement of approximately four. The TIASC at NRL currently has a
central memory of four million bytes, or one million 32-bit words. Both
central memory size and speed are expected to be increased in the future.
In comparison, the CDC 6000s at DTNSRDC have a maximum central memory of
131,072 60-bit words.

The TIASC computer should significantly reduce computational cost
for the extensive numerical processing of large matrices often encountered
by computer programs such as NASTRAN. Therefore, a feasibility study was
undertaken to explore the possibility of taking advantage of this new
computational facility by sharing a NASTRAN job between DTNSRDC's CDC
6000 computer and the TIASC, which, along with the computers at eight
other Navy laboratories, will be tied together through the Navy
Laboratory Computer Network (NALCON). The major mathematical computations
would be performed on the TIASC, and the pre- and postprocessing for those
computations would be handled by the CDC 6000. If such a procedure
proved theoretically feasible, experimentation with the TIASC capabilities
would follow.

Part I of this report documents the feasibility of running NASTRAN
in such a split mode. Part II contains the results of the experimentation
on the TIASC. It details central processor (CP) time comparisons,
problems encountered, and recommendations both for NASTRAN and for other
FORTRAN computer codes on the TIASC.


PART  I:  FEASIBILITY  STUDY 
by 

Myles  M.  Hurwitz 

Specifically addressed were the problems, costs, advantages, etc. of
starting a NASTRAN analysis on the CDC 6000 at DTNSRDC, transferring
large matrices to the ASC for "number-crunching" processing, transferring
the results back to the CDC computer, and restarting the NASTRAN analysis.
The processing may include such operations as matrix decomposition or
eigenvalue extraction.

The  following  questions  were  considered.  The  answers  given,  with 
underlying  assumptions,  are  based  on  work  performed  through  30  June  1976. 

Q.  The  data  transfer  and  cost-effectiveness  aspects  of  the  entire 
operation  become  meaningless  unless  the  required  matrices  can  be 
extracted  easily  from  the  program,  the  results  inserted  back  into  the 
program,  and  the  program  restarted  at  the  breakpoint.  Is  this  possible? 

A. Yes, because of NASTRAN's checkpoint/restart, ALTER, and user tape
capabilities. The program can be started with its checkpoint facility
operational and can be terminated just after the required matrices are
transferred to user tapes using the ALTER capability. After the
required operations on these matrices have been completed by the ASC, the
results may be stored on user tapes, and NASTRAN may be restarted just
after the previous termination point. (The equivalent NASTRAN operations
are skipped and the ASC results are read in by the ALTER facility.)
NASTRAN will then execute to completion normally.

Q.  Can  the  data  be  transferred  from  the  CDC  6000  to  the  ASC? 

A. Yes, but a number of operations may have to be performed to effect a
meaningful transfer. As presently planned, a file transfer capability
will be implemented on the Navy Laboratory Computer Network (NALCON), to
which DTNSRDC's CDC 6000 computers and NRL's TIASC computer will be
connected. Such a capability is scheduled after terminal access becomes
operational on NALCON. (Remote job entry (RJE) will be a third
capability.) A major question, however, concerns the type of file that
can be transferred. There should be no problem in transferring ASCII
files, but transferring binary files, as one would like to do both as a
convenience and as a cost-saver, will not be possible unless an acceptable
file transfer protocol (FTP) is implemented. The present outlook for such
an FTP is not optimistic. If it is not implemented, the NASTRAN binary
information, stored on the user tapes, will have to be converted to ASCII
format. Some preliminary tests indicate that converting one million
numbers will use approximately 90 CPU seconds on the CDC 6000. The one
million figure was obtained by assuming a square matrix of order 10,000
and one percent dense, i.e., only one percent of the one hundred million
numbers are non-zero. This percentage is typical of large NASTRAN
matrices resulting from a standard structural analysis. Unfortunately,
such a conversion will have to be performed four times, twice on each
machine, once binary to ASCII, once ASCII to binary.
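
The one-million figure follows directly from the stated assumptions. A
minimal sketch of the arithmetic is given below; the variable names are
illustrative only, and the 180-second total assumes that both conversions
on the CDC 6000 cost the quoted 90 CPU seconds each. No conversion time
for the ASC is given in the text, so none is estimated here.

```python
# Matrix size assumptions quoted in the text.
ORDER = 10_000          # square matrix of order 10,000
DENSITY_PERCENT = 1     # one percent of the entries are non-zero

total_entries = ORDER * ORDER
nonzero_entries = total_entries * DENSITY_PERCENT // 100

# Preliminary test result quoted in the text: about 90 CPU seconds on the
# CDC 6000 to convert one million numbers between binary and ASCII.
CDC_SECONDS_PER_MILLION = 90

# Assumption: the two conversions performed on the CDC 6000 (binary to
# ASCII on the way out, ASCII to binary on the way back) cost the same.
CDC_CONVERSIONS = 2
cdc_conversion_seconds = (
    CDC_CONVERSIONS * CDC_SECONDS_PER_MILLION * nonzero_entries / 1_000_000
)

print(f"total entries:    {total_entries:,}")
print(f"non-zero entries: {nonzero_entries:,}")
print(f"CDC 6000 conversion time: {cdc_conversion_seconds:.0f} CPU seconds")
```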

Q. How long will it take to transfer the data and how much will it cost?

A. If we assume that the standard wide-band telephone line will be the
mode of communication, a dedicated 50,000-bit/second line will require
approximately 1200 seconds to transmit one million numbers, sixty bits
per number. Unfortunately, since a sixty-bit binary number may have to
be converted to seventy-two ASCII bits, and since a dedicated telephone
line is not probable, the time for transmission may increase to as much
as 3600 seconds, or one hour, assuming an effective transmission rate of
20,000 bits per second as suggested by ARPANET studies. These figures
are for one-way transmission only. Naturally, the results have to be
transferred back to DTNSRDC. The cost of transmitting the data is
presently unknown. The cost of using NALCON is currently expected to
involve a surcharge over standard computer costs. However, with telephone
line charges increasing rapidly, especially in comparison to CPU rates, a
surcharge may not be the final method of distributing costs.
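
The two transmission times quoted above can be reproduced with a short
calculation. This is only a sketch of the arithmetic; the function name
and argument names are assumptions made for illustration, and protocol
overhead and retransmissions are ignored.

```python
def transmission_seconds(number_count, bits_per_number, bits_per_second):
    """One-way transfer time, ignoring protocol overhead and retries."""
    return number_count * bits_per_number / bits_per_second


NUMBERS = 1_000_000

# Binary transfer over a dedicated 50,000-bit/second line, 60 bits/number.
binary_dedicated = transmission_seconds(NUMBERS, 60, 50_000)

# ASCII transfer (72 bits/number) at an effective 20,000 bits/second.
ascii_shared = transmission_seconds(NUMBERS, 72, 20_000)

print(f"binary, dedicated line: {binary_dedicated:.0f} s")
print(f"ASCII, shared line:     {ascii_shared:.0f} s")
print(f"ASCII round trip:       {2 * ascii_shared / 3600:.1f} h")
```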

If any aspect of the entire operation could cause the operation to
be considered unfeasible, it would be the cost and/or time of transmission
rather than any technological aspect of the transfer. One possibility
which could overcome these problems is microwave transfer rather than
telephone line transfer. However, the use of such a system would have to
be significant to justify the large initial cost of implementing it.

Q.  Are  there  any  other  problems  with  the  transfer  of  the  data? 

A. Yes, but they are probably relatively minor. Two examples are:

(1) the single-precision 60-bit CDC word will have to be converted to
two 32-bit ASC words, i.e., one double-precision word; (2) NASTRAN packs
its matrices in such a way that zeroes are not stored. If the ASC
routines performing the "number-crunching" do not recognize such packing,
the matrices will have to be unpacked, operated on, and packed again
before the information can be transferred back to the CDC computer.

Q.  Once  the  data  have  been  transferred  to  the  ASC,  what  software  will  be 
available  to  operate  on  the  data? 

A. Texas Instruments is providing the ASC with a Scientific Program
Library (SPL) which should contain standard matrix operations, e.g.,
solution of linear systems, eigenvalue extraction, etc. However, the
exact composition of the SPL is still undecided. Presumably the routines
in the SPL will take very good advantage of the ASC's pipeline processor,
which gives the ASC its very high speed compared with third generation
computers. What is unknown at present is whether or not these routines
will contain some of the options which are standard in NASTRAN, e.g.,
decomposition with spill, unsymmetric decomposition, several methods of
eigenvalue extraction, complex decomposition, etc. If the SPL does not
contain some of these facilities, the appropriate routines could be
lifted from NASTRAN with some relatively modest modification. However,
such routines may not take good advantage of the pipeline processor
without a major overhaul.

Q. How do costs for such operations on the CDC 6000 and the TIASC compare?

A. Before the costs of computation for the two machines can be compared,
a number of assumptions must be made. We will assume that, at the lowest
priority on the DTNSRDC CDC 6000, a job which uses 140K central memory
(which would mean only a relatively small NASTRAN matrix if spilling
operations are to be avoided) will cost approximately 12 cents per CPU
second or $432 per CPU hour. (The actual cost will be higher due to
charges for I/O and other miscellaneous items.) On the NRL TIASC, the
anticipated lowest priority rate is $1000 per CPU hour. Charges for other
items, if any, e.g., central memory, I/O, etc., are not yet formulated.

To complete the set of assumptions, we must consider speed
comparisons. The anticipated speed of the ASC is four times that of the IBM
360/91. According to NASTRAN time estimates for the decomposition of a
real, symmetric matrix, with no spill, the IBM 360/91 is eight times
faster than a CDC 6600, which, in turn, is three times faster than the
CDC 6400. (Charge times on all DTNSRDC's 6000 computers are in terms of
seconds on the CDC 6400.) Therefore, under this set of assumptions, for
the type of matrix decomposition described (which is probably the type
most commonly used), the TIASC is approximately 96 times faster than the
DTNSRDC CDC 6000 computers. With the assumed charge rate, the ASC will
be one-fortieth as expensive as the CDC 6000. Note carefully, however,
that:

(1)  The  CDC  rate  is  based  on  minimum  NASTRAN  central 
memory.  For  very  large  matrices,  the  full  300K  central 
memory  is  probably  required,  which  increases  the  rate  to 
$576  per  CPU  hour. 

(2) The ASC rate assumes no charge for central memory.
The equivalent CDC rate would be $288 per CPU hour, which
brings the cost factor down to 28, in favor of the ASC.

(3) These comparisons do not include the costs for
converting a matrix on the CDC 6000, with its 60-bit
word in NASTRAN packed format, to an ASC 64-bit,
double-precision word. Nor do they include binary to ASCII
conversions, transferring the data, unpacking the matrix,
and then reversing the procedure to bring the results back
to DTNSRDC.

(4)  It  is  anticipated  that  later  upgrading  of  the  ASC 
will  increase  the  speed  factor  of  the  ASC  to  the  range 
of  7 to  10  times  that  of  the  IBM  360/91. 

Therefore, while the ASC is, in theory, a much faster, much less expensive
computer for the types of calculations considered, the final cost
comparisons, in practice, are still unknown.
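
The speed and cost factors quoted in this answer can be checked with the
following sketch. The constant and function names are illustrative
assumptions; the rates and speed ratios are the ones stated in the text,
and the result ignores every cost that note (3) lists as excluded.

```python
# Speed ratios assumed in the text for real, symmetric decomposition.
ASC_OVER_IBM_360_91 = 4
IBM_360_91_OVER_CDC_6600 = 8
CDC_6600_OVER_CDC_6400 = 3

# CDC charge times are expressed in CDC 6400 seconds, so the ratios chain.
speed_factor = (
    ASC_OVER_IBM_360_91 * IBM_360_91_OVER_CDC_6600 * CDC_6600_OVER_CDC_6400
)


def cost_factor(cdc_rate_per_hour, asc_rate_per_hour, speed):
    """How many times cheaper the ASC is for the same computation."""
    return speed * cdc_rate_per_hour / asc_rate_per_hour


ASC_RATE = 1000  # dollars per CPU hour, anticipated lowest priority

# $432/hour is 12 cents per CPU second with 140K central memory.
with_memory_charge = cost_factor(432, ASC_RATE, speed_factor)

# $288/hour is the equivalent CDC rate without a central memory charge.
without_memory_charge = cost_factor(288, ASC_RATE, speed_factor)

print(f"speed factor: {speed_factor}")
print(f"cost factor at $432/h: {with_memory_charge:.1f}")
print(f"cost factor at $288/h: {without_memory_charge:.1f}")
```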


Q.  Are  there  any  other  advantages  to  transferring  such  operations  to 
the  ASC? 

A.  Yes.  Because  of  the  ASC's  increased  speed  and  a central  memory 
approximately  four  times  larger  than  that  of  the  CDC  6000,  operations  on 
a very  large  matrix,  which  would  tie  up  an  entire  CDC  machine  for  long 
periods  of  time,  could  be  performed  in  just  a few  seconds  on  the  ASC. 
(Differences in word size are taken into account; the ASC has one million
32-bit words vs. CDC's 131,000 60-bit words.) Therefore, freeing the
computer resources at DTNSRDC for other work could be an important
advantage.

Q. As a user, what must I do to run a NASTRAN problem as contemplated?

A.  Under  the  worst  possible  circumstances,  the  following  steps  would 
be  required: 

(1) Run NASTRAN on CDC with its checkpoint and user file
capabilities operational and terminate execution prior to the
time-consuming operation.

(2) Convert each CDC 60-bit binary word on the user file to two
32-bit ASC binary words.

(3) Convert each ASC word from binary to ASCII.

(4) Instruct NALCON to transmit the ASCII file to the ASC.

(5) On the ASC, convert each ASCII word back to binary format.

(6) Expand matrix from NASTRAN packed format to unpacked format,
and then to format required by ASC "number-crunching" program.

(7) Execute ASC program and store results on ASC file.

(8) Convert the matrices on the ASC file to NASTRAN packed format.

(9) Convert binary words to ASCII format.

(10) Instruct NALCON to transmit the ASCII file back to the CDC.

(11) Convert each ASCII word to binary format.

(12) Convert each pair of 32-bit ASC words to one 60-bit CDC word
and store on NASTRAN-type user file.

(13) Restart NASTRAN, with the user file as input, starting after
the time-consuming operation.

Q. Can these steps be automated?

A. Possibly. With CDC's BEGIN/REVERT capability, a set of control
cards can be catalogued and called into the control card record.
Therefore, a complete set of control cards could be set up for users and called
in by an individual user with one or two cards. However, this method
assumes that steps 4-10 above, those steps which involve NALCON's file
transfer and RJE capabilities, can be performed from the CDC control card
and/or input records. It is not presently known whether this procedure
will be possible.

Q.  What  other  alternatives  exist  to  accomplish  the  type  of  operations 
discussed? 

A. The time, cost, and reliability of transferring very large matrices
over telephone lines through the network are probably, at present, the
major problem areas which must be addressed before such a transfer can
become a reality. Therefore, if the reasons for such data transfers are
primarily lower cost and/or freeing DTNSRDC computer resources, rather
than technological advances in computer sharing through networks, then
two other alternatives are immediately obvious:

(1) Because of the relative proximity of DTNSRDC and NRL, a
computer tape could be brought to NRL for processing, and
the results could also be stored on tape and returned.

(2) The complete NASTRAN program could be converted to operate
on the ASC, and input data and output results could be
transferred through the NALCON RJE capability.

Q.  What  is  involved  in  converting  NASTRAN  to  the  ASC? 

A. One of the major possible stumbling blocks in converting NASTRAN to a
new computer is the lack of a linkage editor with which to assemble
NASTRAN object decks into an executable program. This problem existed on
the CDC 6000 and on the Honeywell 6000. Happily, only minor problems
may be expected on the ASC, which has an IBM-type linkage editor.
Therefore, conversion to the ASC would probably require approximately one
man-year of work plus required education in such ASC areas as the FORTRAN
compiler, ASC assembly language, and linkage editor. If, however, this
assumption concerning the linkage editor is incorrect, a significantly
higher level of effort would be required. The resulting program would
probably contain a number of operations in FORTRAN rather than in
machine-dependent assembly language as they exist in the CDC, IBM, and
UNIVAC versions. NASTRAN's I/O package, called GINO, is the primary
example. A major unknown, however, is the efficiency of the overall
execution of NASTRAN on the ASC as compared to its efficiency on the
CDC 6000 since most of the program's operations are not of the
"number-crunching" variety. NASTRAN's major mathematical routines, however,
could be replaced by similar routines from the SPL, hopefully with
relatively modest modification, to increase overall speed.

Any  effort  to  convert  NASTRAN  to  the  ASC  will  be  greatly 
facilitated  by  NALCON's  RJE  capability  and  could  be  greatly  hindered  by 
the  usual  problems  encountered  with  a new  computational  facility. 

SUMMARY AND CONCLUSIONS

The transfer of NASTRAN data from DTNSRDC's CDC 6000 to NRL's
TIASC for lengthy numerical computations and the return transfer of the
results have been studied with regard to procedures, problems, costs,
advantages, and alternatives. While a significant amount of processing
may have to be performed on the data to accomplish such a transfer, the
transfer does seem feasible. At present, however, the major problem
area seems to be the transfer itself, since the availability of a
reliable, cheap, and fast means of transfer of very large amounts of data
is not a certainty. If this potential bottleneck can be overcome, the
advantage, with respect to cost and resources, could be significant.

If there is one recommendation to be made from the study, it is to
attempt to implement a file transfer protocol between the TIASC and
CDC 6000 which would allow binary data transfer. Such a protocol
would significantly reduce the amount of data processing required.

PART II: EXPERIMENTATION


Michael  E.  Golden 


The  major  task  addressed  was  that  of  obtaining  central  processor  (CP) 
times  for  NASTRAN's  real,  symmetric  matrix  decomposition.  As  this 
portion  of  NASTRAN  is  usually  the  most  CP-bound  portion  of  the  program 
for  large  problems,  it  was  hoped  that  CP  time  for  NASTRAN  on  the  TIASC 
would  be  significantly  less  due  to  its  pipeline  processing  capability. 

As  will  be  shown,  this  was  not  the  case.  A decrease  was  obtained,  but  it 
was  only  enough  to  make  the  TIASC  just  marginally  cost  effective  for  the 
standard  NASTRAN  program. 

A basic understanding of a TIASC vector instruction is fundamental
to the ensuing discussion. A vector instruction performs vector and
matrix operations to take maximum advantage of the pipeline processing
capability of the TIASC. A vector instruction resembles a nested FORTRAN
DO-loop but operates as a single TIASC instruction.¹ For example, the
nested loop

      DO 10 I=1,10
      DO 10 J=1,10
   10 A(I,J) = B(I,J) + C(I,J)

would execute as one vector instruction instead of 100 scalar instructions.
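
The distinction between the scalar and vector views of this loop can be
pictured with the sketch below. It is purely conceptual: Python does not
model the TIASC pipeline, the operand values are arbitrary, and
`vector_add` is a hypothetical stand-in for a single vector instruction,
not a TIASC or NASTRAN routine.

```python
N = 10

# Arbitrary illustrative operands (not taken from the report).
B = [[float(i + j) for j in range(N)] for i in range(N)]
C = [[float(i * j) for j in range(N)] for i in range(N)]

# Scalar view: the nested DO-loop issues one add per element.
A_scalar = [[0.0] * N for _ in range(N)]
scalar_adds = 0
for i in range(N):
    for j in range(N):
        A_scalar[i][j] = B[i][j] + C[i][j]
        scalar_adds += 1


def vector_add(x, y):
    """Hypothetical stand-in for one vector instruction over two streams."""
    return [a + b for a, b in zip(x, y)]


# Vector view: the same 100 elements are handled by a single operation
# over the flattened operand arrays.
flat_b = [value for row in B for value in row]
flat_c = [value for row in C for value in row]
flat_a = vector_add(flat_b, flat_c)
A_vector = [flat_a[i * N:(i + 1) * N] for i in range(N)]

print(scalar_adds, A_scalar == A_vector)
```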
The computer program on which the experimentation was based is the
simultaneous linear equation solver extracted from NASTRAN Level 15 but
with NASTRAN's General Input/Output (GINO) routines extracted from
NASTRAN Level 12. Level 12 GINO, written in FORTRAN, was used to avoid
reprogramming Level 15 GINO, which was written in computer-dependent
assembly language. The major time-consuming portion of the equation
solver is the matrix decomposition.


"Glossary  of  Terms  and  Partial  List  of  Ac: 
Incorporated,  Publication  Number  930200-1,  M 
The fast compile, slow execute (FX) FORTRAN compiler was used first
for experimentation, for debugging purposes and to obtain a fully
executable program. However, due to TIASC system software problems, it
was used for most of the project time period. These problems will be
discussed later in this report. The execution time for NASTRAN's real,
symmetric matrix decomposition on the TIASC using the FX compiler was
roughly 80% of that for the same problem on a CDC 6400.

After satisfactory results were obtained with the FX compiler, the
slow compile, fast execute (NX) FORTRAN compiler was used. More system
software problems, to be discussed later, were encountered. The compiler,
given standard NASTRAN source code, could produce no vector instructions;
and NASTRAN therefore could not use the pipeline processing capability.

The lack of vector instructions was due to the type of NASTRAN FORTRAN
code presented to the compiler. Unfortunately, precise restrictions must
be met in order for the NX compiler to vectorize FORTRAN source
statements. Standard NASTRAN source code did not meet all these
restrictions. The execution time for NASTRAN matrix decomposition using
the NX compiler with no vectorization was approximately 30% of that for
the same problem on a CDC 6400.

When the NASTRAN source code (the inner loop of subroutine SDCOMP,
which contains the code most important with respect to CP time) was
modified, vectorization was obtained using the NX compiler. Execution
time dropped by approximately 20% over the non-vectorized NX version.
The final execution time for NASTRAN matrix decomposition on the TIASC
was therefore about 25% of that for the same problem on a CDC 6400.

The costs given in the following two tables concern only CP
charges. They are provided only as a rough estimate of cost to the user
and do not incorporate such chargeable items as central memory usage,
costs for printing output, etc. CP cost of the TIASC at NRL is currently
$720 per CP hour but will change to $1000 per CP hour in the near future.
The cost of the CDC 6400 runs as given is for priority P2, at $270 per
CP hour.

Table 1 summarizes the CP times and costs for NASTRAN on the TIASC
and for the same problem run on a CDC 6400. The problem involved the
solution of a set of simultaneous linear equations of order 1500, with a
symmetric coefficient matrix with a semi-bandwidth of 300. Double
precision computation was used on the TIASC; single precision computation
was used on the CDC 6400. CDC single precision has a 48-bit fraction,
and TIASC double precision has a 56-bit fraction. However, double
precision operations on the TIASC take no longer than single precision
operations.


TABLE 1 - NASTRAN MATRIX DECOMPOSITION TIMES AND COSTS

                                              TIASC
                                ---------------------------------------
                     CDC                   NX, no          NX with
                     6400       FX         vectorization   vectorization

CP time (seconds)    47.4       38.6       14.4            12.3

CP cost              $3.56      $7.72      $2.88           $2.46
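
The cost entries in Table 1 are consistent with the CP times and the
hourly rates quoted earlier ($270 per CP hour for the CDC 6400 and $720
per CP hour for the TIASC). The sketch below reproduces them; the
function name and dictionary labels are illustrative assumptions, and
decimal arithmetic is used only so that rounding to the cent is exact.

```python
from decimal import Decimal, ROUND_HALF_UP


def cp_cost(cp_seconds, rate_per_hour):
    """CP charge in dollars, rounded to the nearest cent."""
    dollars = Decimal(cp_seconds) * Decimal(rate_per_hour) / Decimal(3600)
    return dollars.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# (CP seconds, dollars per CP hour) for each Table 1 column.
runs = {
    "CDC 6400": ("47.4", "270"),
    "TIASC FX": ("38.6", "720"),
    "TIASC NX, no vectorization": ("14.4", "720"),
    "TIASC NX with vectorization": ("12.3", "720"),
}

costs = {name: cp_cost(secs, rate) for name, (secs, rate) in runs.items()}

cdc_seconds = Decimal(runs["CDC 6400"][0])
for name, (secs, _rate) in runs.items():
    fraction = Decimal(secs) / cdc_seconds
    print(f"{name:28s} ${costs[name]}  {fraction:.0%} of CDC 6400 time")
```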

DTNSRDC Code 184 has been using the TIASC experimentally for
non-NASTRAN programs to obtain CP comparisons. With suitable source code
modification, vectorization was obtained. CP cost reductions have been
on the order of 35-40 to 1 when compared to DTNSRDC's CDC 6400, making
the TIASC very cost effective for straightforward, vectorized FORTRAN
routines of the "number-crunching" variety. Since the TIASC is advertised
as approximately four times faster than the IBM 360/91, experiments were
also performed on the IBM 360/91 at Johns Hopkins University Applied
Physics Laboratory. The current cost to DTNSRDC users for that IBM
360/91 is $1000 per CP hour plus surcharge.

Table 2 gives CP time in seconds and cost for the NRL TIASC, the
DTNSRDC CDC 6400, and the APL IBM 360/91. The non-NASTRAN problem was
the solution of a set of 300 linear equations in 300 unknowns by the
method of Gaussian elimination. The matrix concerned was real, full,
and non-symmetric. All results given in this table are for single
precision computations.

TABLE 2 - MATRIX INVERSION TIMES AND COSTS

                           CDC 6400              IBM 360/91   TIASC
                   ------------------------
                   OPT-0    OPT-1    OPT-2       H Compiler   NX Compiler

unmodified         335.8    142.5    135.4       8.8          27.1
subroutine         $25.19   $10.69   $10.16      $2.45        $5.42

modified                                         10.3         3.8
subroutine for                                   $2.86        $0.76
vectorization

The IBM 360/91 column in Table 2 indicates that the TIASC derives
its speed primarily from the execution of vector instructions. Most
computer programs do not vectorize immediately on the TIASC, so usually
additional coding must be added or existing code changed to achieve
vectorization. The changes/additions generally take the form of DO-loop
simplification. The simplification adds more scalar code to the program,
and the program should therefore consume more CP time. However, the
vectorization achieved more than compensates for the code added, resulting
in a net reduction in CP time on the TIASC.

Unfortunately, this does not happen on a scalar machine such as the
IBM 360/91. The addition of new code increases total job CP time in all
cases, as Table 2 shows. In general, a vectorized program which runs
most efficiently on the TIASC will not be the most cost effective tool
for computation on non-TIASC hardware. Any program written for the TIASC
will, in general, be a special purpose program to take advantage of
vector instructions and will contain code which on any other computer
would be considered highly inefficient.

Also note that, for this problem, the fastest TIASC time was only
2.3 times faster than the fastest IBM 360/91 time, rather than the
advertised factor of 4.
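
The ratios discussed in connection with Table 2 can be checked from the
IBM 360/91 and TIASC entries. The sketch below uses only the CP times
from the table; the variable names are illustrative assumptions.

```python
# CP seconds from Table 2 (IBM 360/91 H compiler and TIASC NX compiler).
ibm_unmodified, ibm_modified = 8.8, 10.3
tiasc_unmodified, tiasc_modified = 27.1, 3.8

# Effect of the source changes made to obtain vectorization.
ibm_slowdown = ibm_modified / ibm_unmodified        # scalar machine
tiasc_speedup = tiasc_unmodified / tiasc_modified   # vector machine

# Fastest time on each machine, regardless of which source was used.
fastest_ibm = min(ibm_unmodified, ibm_modified)
fastest_tiasc = min(tiasc_unmodified, tiasc_modified)
tiasc_over_ibm = fastest_ibm / fastest_tiasc

print(f"IBM 360/91 runs {ibm_slowdown:.2f}x longer with the modified code")
print(f"TIASC runs {tiasc_speedup:.1f}x faster with the modified code")
print(f"fastest TIASC vs fastest IBM 360/91: {tiasc_over_ibm:.1f}x")
```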

PROBLEMS


A secondary task of this experimental work was to determine the
general reliability of the TIASC at this time. Many problems, of varying
importance, were encountered and will be discussed in order of increasing
importance. Anticipated problems associated with total conversion of
NASTRAN to the TIASC will also be discussed.

The  first  problem,  although  very  minor,  indicates  the  current  state 
of  the  TIASC  operating  system  (OS)  and  system  software.  The  problem  is 
the  loss  of  permanent  files.  On  one  occasion,  the  master  NASTRAN  source 
file  was  destroyed,  apparently  by  the  OS.  Only  the  existence  of  an 
earlier  backup  file  prevented  a total  catastrophe.  In  any  event,  one 
week  was  lost  in  returning  the  file  to  its  prior  condition. 

In addition, an interactive system store command to a permanent file
followed immediately by a system crash will totally destroy the file.
The severity of this event can be minimized to some extent by an alternate
store command, but the possibility of loss still exists.

A second problem involves a minor difference in the FORTRAN code
expected by the TIASC compiler from that of IBM, CDC, and UNIVAC computers.
Alternate  returns  from  subroutines  must  be  preceded  by  an  character, 
not  a character  as  in  IBM  and  UNIVAC  FORTRAN.  As  alternate  returns 
appear profusely throughout NASTRAN, a line by line search of the source
code is necessitated. Although the search can be automated through a
string manipulation language such as SNOBOL, the changes must be made
before NASTRAN will compile properly. The TIASC character set is standard
EBCDIC.

Another problem lies in the failure of the FORTRAN BACKSPACE
command to work properly due to a bug in the I/O portion of the FORTRAN
library. The problem can be eliminated by replacing the BACKSPACE
command with a REWIND and a series of READ commands, or it can be
overcome by using TIASC machine dependent code as described in recommendation
two at the end of this report.

Without a doubt, the FORTRAN compilers present the most serious
problem encountered to date. Most of the time for experimentation was
spent in "kludging" the NASTRAN source code to get around compiler bugs.
Virtually no routine was left untouched. The bugs include:

(1)  faulty  linkage  to  FORTRAN  statement  lumbers  referenced 
via  alternate  returns  if  the  alternate  return  was  the  only  reference  to 
that  statement  number  in  the  routine; 

(2)  failure  to  detect  FORTRAN  errors  such  as  unbalanced 
parentheses  and  CDC  call  statements  with  alternate  "RETURNS"  phrases  left 
in,  and  then  execution  of  this  faulty  code  (FX  compiler  only); 

(3)  failure  to  pass  externals  properly  as  subroutine  argument 
parameters;  and 

(4) generation of faulty DO-loop code: storing by the compiler
of  a value  in  a register,  destroying  that  value,  and  then  the  later 
assumption  by  the  compiler  that  the  value  still  existed  in  the  register 
for  use. 

Regardless  of  the  problems  encountered,  however,  the  NRL  and  TI 
personnel  were  extremely  helpful  throughout  the  experimentation  phase. 

All  the  compiler  bugs  could  be  removed  by  source  code  modification,  but 
problems with the compiler do exist and will exist for the foreseeable
future.  All  prospective  users  of  the  TIASC  should  be  aware  of  this 
problem. 

Curiously,  the  other  non-NASTRAN  programs  run  on  the  TIASC  have 
experienced no compiler errors. This may be due to straightforward,
simplistic code or other factors. Further investigation would seem to be
in  order. 
FUTURE PROBLEMS


Attempts to convert NASTRAN completely to the TIASC will immediately
encounter  two  problems.  One  is  relatively  minor  and  can  be  easily  solved: 
that of the BLOCK DATA subprogram as it exists on the TIASC. Only one
such  subprogram  is  allowed;  however,  NASTRAN  has  many.  Therefore,  all 
NASTRAN BLOCK DATA subprograms will have to be merged into other subroutines
or made into dummy subroutines. The second problem, which may
ultimately  render  a total  conversion  process  nearly  impossible,  concerns 
the placement of FORTRAN COMMON statements in the load module via the
linkage editor. The TIASC linkage editor does not permit a programmer
to explicitly reference COMMON blocks for arbitrary placement in an overlaid
program tree structure except to move them up into the tree into low
order core. As a result, NASTRAN open core COMMON blocks cannot easily
be  placed  at  the  end  of  a particular  tree  branch  in  high  order  core  at  the 
end  of  the  machine  instructions. 

This problem may be solvable, since the loader loads COMMON blocks
in  each  tree  branch  at  the  end  of  the  first  subprogram  in  which  they  are 
encountered,  and  in  the  order  specified  in  the  subprogram.  To  solve  the 
problem, an open core COMMON block must appear in only one subprogram,
and  that  subprogram  must  be  at  the  end  of  its  appropriate  branch  in  the 
overlay  structure.  This  solution  would  be  usable  only  through  major 
NASTRAN  source  code  modification. 
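
The two conditions just stated can be checked mechanically against a
description of the overlay structure. The following is a minimal sketch in
Python. The data layout, the function name check_open_core_placement, and
the subprogram and block names in the example are all hypothetical and are
not taken from NASTRAN or from the TIASC linkage editor.

```python
def check_open_core_placement(branches, commons, open_core_blocks):
    """Report open core COMMON blocks that break the placement rule.

    branches:         branch name -> subprogram names, in load order
    commons:          subprogram name -> COMMON block names it declares
    open_core_blocks: names of the COMMON blocks used as open core

    A block passes only if exactly one subprogram declares it and that
    subprogram is the last one in its branch.
    """
    problems = []
    for block in sorted(open_core_blocks):
        users = [
            (branch, sub)
            for branch, subs in branches.items()
            for sub in subs
            if block in commons.get(sub, ())
        ]
        if len(users) != 1:
            problems.append(
                f"{block}: declared in {len(users)} subprograms, expected 1"
            )
            continue
        branch, sub = users[0]
        if branches[branch][-1] != sub:
            problems.append(
                f"{block}: {sub} is not the last subprogram in branch {branch}"
            )
    return problems


# Hypothetical two-branch overlay used only to exercise the check.
example_branches = {"BR1": ["SUBA", "SUBB"], "BR2": ["SUBC", "SUBD"]}
example_commons = {"SUBB": ["OPEN1"], "SUBC": ["OPEN2"], "SUBD": ["SYSBLK"]}
example_report = check_open_core_placement(
    example_branches, example_commons, {"OPEN1", "OPEN2"}
)
```

In this example OPEN1 satisfies both conditions, while OPEN2 is reported
because SUBC is not the last subprogram in its branch.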

If in NASTRAN two tree branches use the same open core COMMON block
for  passing  information  from  one  to  the  other,  this  solution  is  useless. 
The  only  possible  solution  would  be  to  modify  the  TIASC  linkage  editor 
to handle COMMON block placement, rather than to modify NASTRAN. Based
on earlier attempts by Ford Motor Company to modify the Honeywell H6000
linkage  editor  for  NASTRAN,  conversion  of  the  TIASC  linkage  editor  for 
NASTRAN  would  probably  take  in  excess  of  two  years.  After  a reliable 
modified  linkage  editor  was  obtained,  the  NASTRAN  conversion  would  still 
lie  ahead. 


RECOMMENDATIONS 


The  experiments  performed  indicate  that  the  standard  NASTRAN  program, 
as  a whole,  will  vectorize  only  marginally,  and  that  any  program  on  the 
TIASC  which  does  not  vectorize  to  a significant  degree  will  be  only 
marginally  cost  effective.  NASTRAN  was  originally  written  so  as  not  to 
restrict  the  size  of  problems  by  using  arrays  of  fixed  dimensions.  As  a 
result,  its  large  arrays  are  passed  through  argument  lists  or  placed  in 
COMMON statements to be overlaid at the end of source coding in a tree
structure.  These  arrays  typically  are  given  a dimension  of  one,  and 
their true size is driven by NASTRAN's open core philosophy. Unfortunately,
the  TIASC  compiler  must  have  absolute  dimensions  on  all  arrays  at  compile 
time  in  order  to  generate  vector  instructions.  The  result  is  that: 

(1) all DIMENSION and COMMON statements must be recoded
to  provide  true  array  sizes,  or 

(2)  all  loops  using  these  arrays  must  be  broken  into  their 
simplest components.

Otherwise,  vectorization  on  the  TIASC  does  not  occur. 

The  work  described  here  has  led  to  several  recommendations  for  any 
future  work  with  NASTRAN  on  the  TIASC.  The  first  recommendation  is  a 
direct  consequence  of  the  CP  comparisons  obtained  thus  far  and  the  fact 
that  NASTRAN  will  vectorize  only  marginally.  The  standard  NASTRAN 
matrix  decomposition  code,  as  well  as  other  time  consuming  numerical 
computations,  e.g.,  eigenvalue  extraction,  should  be  replaced.  Although 
the complex and non-symmetric real equation solvers were not tested, they
generally  follow  the  form  of  the  real  symmetric  solver  tested.  These 
routines  should  be  replaced  with  a set  of  routines  which  will  handle 
large,  sparse,  banded  matrices  in-core  with  a separate  set  of  routines 
for  matrices  too  large  to  fit  in  core.  Such  code  either  already  exists 
at  DTNSRDC,  developed  by  the  Computation  and  Mathematics  Department,  or 
can  easily  be  adapted  from  existing  code.  This  task  should  take  no 
longer  than  three  man-months,  and  the  result  would  be  a code  which 
utilizes  the  TIASC  pipeline  capability.  With  a CP  speed  ratio  of  40  to  1 
as  evidenced  by  the  non-NASTRAN  studies  carried  out,  the  TIASC  is  a cost 
effective tool for solving large systems of equations if total or partial
vectorization  of  the  source  code  is  achieved. 
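
As an illustration of the kind of in-core banded routine recommended here,
the following is a minimal sketch of a Cholesky factorization and solve for
a symmetric banded matrix. It is not the DTNSRDC code. It assumes the
matrix is positive definite, and the storage layout and the names
band_cholesky and band_solve are choices made for the example. Python is
used for illustration only.

```python
import math


def band_cholesky(ab, m):
    """Factor a symmetric positive definite band matrix A = L * L^T.

    ab[i][d] holds A[i][i-d] for d = 0..m (d = 0 is the diagonal);
    entries with i - d < 0 are ignored. Returns L in the same layout.
    """
    n = len(ab)
    lb = [[0.0] * (m + 1) for _ in range(n)]
    for i in range(n):
        first = max(0, i - m)              # first column inside the band of row i
        for j in range(first, i + 1):
            s = ab[i][i - j]
            for k in range(first, j):      # inner product over the band only
                s -= lb[i][i - k] * lb[j][j - k]
            if j == i:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lb[i][0] = math.sqrt(s)
            else:
                lb[i][i - j] = s / lb[j][0]
    return lb


def band_solve(lb, m, b):
    """Solve A x = b using the band factor returned by band_cholesky."""
    n = len(lb)
    y = [0.0] * n
    for i in range(n):                     # forward substitution: L y = b
        s = b[i]
        for k in range(max(0, i - m), i):
            s -= lb[i][i - k] * y[k]
        y[i] = s / lb[i][0]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):         # back substitution: L^T x = y
        s = y[i]
        for k in range(i + 1, min(n, i + m + 1)):
            s -= lb[k][k - i] * x[k]
        x[i] = s / lb[i][0]
    return x
```

The inner loops touch at most m entries per row, so both storage and work
scale with the bandwidth rather than with the full matrix order. Those
short inner-product loops are the part of the computation that a
vectorizing compiler would be expected to target.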

Secondly,  the  level  12  NASTRAN  GINO  FORTRAN  code  currently  in  use 
with NASTRAN on the TIASC should be replaced with machine-dependent code,
either  by  direct  disk  I/O  via  the  S$DIO  utility  or  by  BSAM  assembly 
language  macros.  CP  costs  associated  with  I/O  will  drop  one-third  to 
one-half.  This  task  should  take  no  longer  than  3 to  6 man-months  to 
accomplish.

Prior  experience  indicates  that  a full  NASTRAN  conversion  should  take 
two  to  three  man-years.  However,  the  problems  cited  earlier  in  this 
report  could  result  in  a much  longer  period  of  time,  possibly  in  excess 
of  five  years,  for  a total  NASTRAN  conversion  to  the  TIASC. 

A final general recommendation can be made. Any groups envisioning
use  of  the  TIASC  for  their  programs  should  run  extensive  benchmarks  on 
their  own  computers.  Upon  conversion  of  the  programs  to  the  TIASC,  all 
program  paths  should  be  tested  exhaustively.  Conversion  is  entirely 
possible,  but  users  should  be  prepared  to  encounter  problems  with  the 
FORTRAN compiler, and to perform major code revisions to obtain pipeline
processing  or  remove  unforeseen  system  problems. 